Integrated bioinformatics analysis reveals marker genes and immune infiltration for pulmonary arterial hypertension

Pulmonary arterial hypertension (PAH) is a chronic cardiopulmonary syndrome characterized by a high pulmonary vascular load that eventually causes right ventricular (RV) heart failure and even death. However, the mechanism of PAH remains unclear. The purpose of this research was to detect the underlying key genes and potential mechanisms of PAH using several bioinformatic methods. The microarray datasets GSE22356, GSE131793 and GSE168905 were acquired from the GEO. Subsequently, a range of bioinformatics tools, including DAVID, STRING, R and Cytoscape, were used to identify differentially expressed genes (DEGs) between PAH patients and healthy controls and to conduct GO annotation, KEGG enrichment analysis and PPI network construction. Additionally, we predicted the transcription factors (TFs) regulating the DEGs with the iRegulon plugin of Cytoscape, and CIBERSORT was used to conduct immune infiltration analysis. A total of 1,277 DEGs (403 up-regulated and 874 down-regulated) were identified from peripheral blood samples of 32 PAH patients and 29 controls, among which SLC4A1, AHSP, ALAS2, CA1, HBD, SNCA, HBM, SELENBP1, SERPINE1 and ITGA2B were detected as hub genes. The DEGs were mainly enriched in protein binding, extracellular exosome, extracellular space, extracellular region and integral component of plasma membrane. The hub genes were chiefly enriched in extracellular exosome, hemoglobin complex, blood microparticle and oxygen transporter activity. In the TF-DEG network, 42 target DEGs and 6 TFs were predicted with an NES > 4 (TEAD4, TGIF2LY, GATA5, GATA1, GATA2, FOS). Immune infiltration analysis showed that monocytes made up the largest proportion of immune cells. Trend analysis of the infiltrating immune cells showed that PAH patients had higher infiltration of activated NK cells, monocytes, activated memory CD4 T cells and mast cells than healthy controls, and lower infiltration of naive CD4 T cells.
We detected SLC4A1, AHSP, ALAS2, CA1, HBD, SNCA, HBM, SELENBP1, SERPINE1 and ITGA2B as the most significant markers of PAH. PAH patients had higher infiltration of activated NK cells, monocytes, activated memory CD4 T cells and mast cells than healthy controls, and lower infiltration of naive CD4 T cells. The identified genes and these immune cells probably have specific regulatory relationships in the development of PAH.


Identification of DEGs in PAH.
A total of 580 DEGs between the PAH group and the healthy group were screened from GSE22356, consisting of 162 up-regulated and 418 down-regulated genes (Fig. 2A,D). Likewise, heatmaps and volcano plots showed that 156 up-regulated and 13 down-regulated DEGs were identified from GSE131793, and 102 up-regulated and 456 down-regulated DEGs were identified from GSE168905 (Fig. 2B,C,E,F) (p value < 0.05 and |log2FC| ≥ 0.5). We then merged the up-regulated and down-regulated DEGs from the three datasets, which yielded a total of 403 up-regulated and 874 down-regulated DEGs.
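The filtering and merging step can be illustrated with a minimal Python sketch. It assumes that "merging" means taking the union of the per-dataset gene sets, which the text does not state explicitly; the per-dataset up-regulated counts (162 + 156 + 102 = 420) exceed the merged total of 403, which would be consistent with duplicates being collapsed. The gene values, the `datasets` dictionary and the `split_degs` helper are hypothetical and are not taken from the study.

```python
# Hypothetical per-dataset results: gene -> (log2 fold change, p value).
# The numbers are invented for illustration and are not the study's data.
datasets = {
    "GSE22356": {"ALAS2": (1.4, 0.001), "SNCA": (0.9, 0.020), "TGFB1": (-0.7, 0.030)},
    "GSE131793": {"ALAS2": (0.8, 0.010), "CCR7": (-0.6, 0.040), "ACTB": (0.1, 0.900)},
    "GSE168905": {"CCR7": (-1.2, 0.002), "HBD": (0.6, 0.200)},
}


def split_degs(results, p_cutoff=0.05, lfc_cutoff=0.5):
    """Return (up, down) gene sets with p < p_cutoff and |log2FC| >= lfc_cutoff."""
    up, down = set(), set()
    for gene, (log2fc, p_value) in results.items():
        if p_value < p_cutoff and abs(log2fc) >= lfc_cutoff:
            (up if log2fc > 0 else down).add(gene)
    return up, down


# Assumption: merging is a set union, so a gene found in several datasets counts once.
merged_up, merged_down = set(), set()
for results in datasets.values():
    up, down = split_degs(results)
    merged_up |= up
    merged_down |= down

print(sorted(merged_up), sorted(merged_down))
```

A union does not resolve genes that are up-regulated in one dataset and down-regulated in another; how such conflicts were handled is not described in the text.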
Functional enrichment analysis of DEGs. To characterize PAH-related functional annotation and pathway enrichment, the DEGs with an absolute log2 fold change greater than 1 were selected for GO and KEGG analysis; detailed results are presented in Tables 1 and 2. In the GO analysis, the most enriched biological process (BP) terms were hydrogen peroxide catabolic process, positive regulation of leukocyte migration and homeostasis of number of cells (Fig. 3A). In the cellular component (CC) category, platelet alpha granule membrane, haptoglobin-hemoglobin complex and platelet alpha granule were most closely related to PAH (Fig. 3B). Organic acid binding, chemokine binding and haptoglobin binding were the most clearly enriched terms in the molecular function (MF) category (Fig. 3C). KEGG mapping is a predictive method that reconstructs molecular network systems from molecular building blocks on the basis of functional orthologs [10][11][12]. The KEGG analysis indicated that the genes were mainly associated with hematopoietic cell lineage, the TGF-beta signaling pathway and ECM-receptor interaction (Fig. 3D).
PPI network construction and hub genes identification. Uploading [...] nodes and 57 edges was visualized via Cytoscape software (Fig. 4A). The most remarkable module was then identified with the cytoHubba plug-in according to the maximal clique centrality (MCC) topological analysis method (Fig. 4B). The top 10 nodes with the highest connectivity in the network were taken as hub genes: SLC4A1, AHSP, ALAS2, CA1, HBD, SNCA, HBM, SELENBP1, SERPINE1 and ITGA2B (Table 3). Additionally, the most significantly enriched BPs showed that the hub genes were related to oxygen transport, erythrocyte development, hydrogen peroxide catabolic process and bicarbonate transport. The enriched CCs included hemoglobin complex, blood microparticle, haptoglobin-hemoglobin complex and platelet alpha granule membrane.
MFs were mainly enriched in hemoglobin binding, haptoglobin binding, oxygen transporter activity and organic acid binding (Fig. 4C).
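The idea behind term enrichment can be illustrated with a plain hypergeometric over-representation test. This is a minimal sketch of the general technique only: DAVID reports a modified Fisher exact statistic (the EASE score), which this code does not reproduce, and the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_degs, n_overlap):
    """P(X >= n_overlap) when n_degs genes are drawn without replacement
    from n_background genes, n_term of which carry the annotation term."""
    upper = min(n_term, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers: 20 background genes, 5 annotated with the term,
# 4 DEGs, 3 of which carry the term.
p_example = hypergeom_enrichment_p(20, 5, 4, 3)
print(p_example)
```

A small p value means the term is carried by more DEGs than expected by chance; in practice the p values of all tested terms would also be corrected for multiple testing.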

Analysis of TF-target regulating networks. To gain further insight into the regulatory TFs of the selected DEGs, we used the iRegulon plugin to predict the TFs of the DEGs. The resulting TF-DEG network contained 48 nodes, consisting of 42 DEGs and 6 TFs with an NES > 4 (TEAD4, TGIF2LY, GATA5, GATA1, GATA2, FOS) (Fig. 5). Specifically, 36 DEGs were predicted as targets of TGIF2LY, 15 as targets of TEAD4, 17 as targets of GATA2, 25 as targets of GATA5, 19 as targets of GATA1 and 23 as targets of FOS 13. Three of the predicted TFs belong to the family of GATA zinc finger TFs, which bind to various motifs in the promoters of DEGs. It can be speculated that these TFs regulate the development and progression of PAH by activating or repressing the transcription of DEGs (details of the transcription factors are shown in Table 4).
Immune cells infiltration analysis. Using CIBERSORT, the relative proportions of 22 infiltrating immune cell types in PAH and control samples were determined (Fig. 6A); details are presented in Table 3. Furthermore, the heatmap showed that the 22 immune cell subpopulations differed significantly between PAH and controls, and monocytes made up the largest proportion of immune cells (Fig. 6B). Correlation analysis showed that macrophages M had the strongest positive correlation with both T cells follicular helper and T cells regulatory (Tregs) (r = 0.9, r = 0.9); resting dendritic cells and activated mast cells were also highly positively correlated (r = 0.83), while the strongest negative correlation was between activated dendritic cells and resting NK cells (r = −0.6) (Fig. 6C). Trend analysis of the infiltrating immune cells showed that PAH patients had higher infiltration of activated NK cells, monocytes, activated memory CD4 T cells and mast cells than healthy controls, and lower infiltration of naive CD4 T cells (Fig. 6D; p < 0.05).
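The pairwise correlation between cell-type fractions reported above can be sketched as follows. The text does not say which correlation coefficient was used, so Pearson's r is an assumption here, and the fraction vectors are hypothetical values chosen only to show the calculation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical fractions of two cell types across four samples.
cell_type_a = [0.10, 0.20, 0.30, 0.40]
cell_type_b = [0.05, 0.10, 0.15, 0.20]

r_example = pearson_r(cell_type_a, cell_type_b)
print(round(r_example, 3))
```

Because the fractions estimated for one sample sum to one, correlations between cell types are not fully independent of each other, which is worth keeping in mind when reading a correlation heatmap.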

Discussion
PAH is a rare, chronic, refractory syndrome accompanied by pathological changes of the distal pulmonary arteries such as vascular remodeling, lumen stenosis, laminar intimal proliferation and fibrosis, and plexiform lesions with excessive, poorly formed capillaries 14. The current clinical definition of PAH is a mean pulmonary arterial pressure (mPAP) ≥ 25 mmHg measured during right heart catheterization (1 mmHg = 0.133 kPa) 15. PAH is divided into many subgroups according to etiology. According to the latest Chinese PAH registry, congenital heart disease-related PAH was reported as the most prevalent subtype, accounting for about 43% of all cases 3. In addition to anticoagulants, calcium channel blockers 16, diuretics, exercise rehabilitation 17, oxygen therapy 18 and other basic measures and conventional therapies to reduce right ventricular preload and improve left ventricular filling, the treatment of PAH also includes drugs targeting the prostacyclin pathway, the endothelin pathway and the nitric oxide-cGMP pathway, such as epoprostenol, iloprost, treprostinil, bosentan, ambrisentan, cediline, nafil, sildenafil and riociguat 5,19. A previous study reported that tyrosine kinase inhibition (TKI) inhibits cell proliferation and thereby counteracts pulmonary vascular remodeling 20. Although substantial efforts have been made in the PAH field in recent years, the pathogenesis of PAH remains to be further elucidated.

Figure 2. (A-C) Heat-maps of the differentially expressed genes of GSE22356, GSE131793 and GSE168905 (P < 0.05, |log FC| > 1). Up-regulated genes are shown in red and down-regulated genes in blue. (D-F) Volcano plots of genes identified in PAH in GSE22356, GSE131793 and GSE168905 (P < 0.05, |log FC| > 0.5). Red dots represent up-regulated genes and green dots represent down-regulated genes.
In recent decades, rapid advances in gene sequencing technologies and bioinformatics have made it possible to exploit abundant sequencing data for further analysis. In this research, we identified 403 up-regulated and 874 down-regulated DEGs from peripheral blood samples of 29 controls and 32 PAH patients, whose expression profiles were downloaded from the GEO database. In addition, PPI network construction and functional enrichment analyses were performed to filter key genes and important pathways. GO annotation showed that the DEGs were enriched in extracellular exosome, integral component of plasma membrane, platelet degranulation, blood microparticle, extracellular space, extracellular region, immune response, cell surface and protein binding. KEGG pathway analysis showed that the DEGs were mainly mapped to hematopoietic cell lineage, cytokine-cytokine receptor interaction, the transforming growth factor beta (TGF-β) signaling pathway, extracellular matrix (ECM)-receptor interaction and focal adhesion. A PPI network helps researchers investigate the underlying molecular mechanism of PAH because the DEGs are grouped and ordered in the network according to their interactions 21. From this network, 10 hub genes (SLC4A1, AHSP, ALAS2, CA1, HBD, SNCA, HBM, SELENBP1, SERPINE1, ITGA2B) related to PAH were detected using Cytoscape. We speculate that these significant DEGs contribute to the occurrence and progression of PAH. SLC4A1 (solute carrier family 4 member 1), a member of the anion exchanger (AE) family, is expressed mainly in the erythrocyte plasma membrane, where it functions as a chloride/bicarbonate exchanger involved in the transport of carbon dioxide from tissues to the lungs 22. A recent genome-wide association study identified candidate genes, including SLC4A1, that are known to regulate erythrocyte structure, metabolic processes and ion channels 23. Kaneda et al. used a lung cancer (LC) mouse model to examine the proteome of cancer cell-secreted extracellular vesicles (cEVs) and found that SLC4A1 was co-expressed in CHL1-expressing EVs 24. AHSP (alpha-hemoglobin-stabilizing protein) is a molecular chaperone that binds specifically to free α-globin and participates in hemoglobin assembly. The protein binds monomeric alpha-globin until it is transferred to beta-globin to form a heterodimer, which in turn binds another heterodimer to form the stable tetrameric hemoglobin 25. During normal erythroid development, AHSP acts as a chaperone that guards against deleterious aggregation of alpha-hemoglobin and, in particular, protects free alpha-hemoglobin from precipitation. AHSP modulates pathological conditions with excess alpha-hemoglobin, such as beta-thalassemia 26. ALAS2 encodes 5′-aminolevulinate synthase 2, an erythroid-specific mitochondrial enzyme that catalyzes the first step of the heme biosynthetic pathway 27. Defects in ALAS2 lead to X-linked pyridoxine-responsive sideroblastic anemia 28.

Scientific Reports
Carbonic anhydrases (CAs) are zinc metalloenzymes that mainly catalyze the reversible hydration of carbon dioxide 29. CAs are involved in diverse biological processes, including respiration, calcification, acid-base balance and bone resorption. Studies have confirmed that the CA1 gene is closely linked to the CA2 and CA3 genes on chromosome 8 and encodes a cytosolic protein found at the highest level in red blood cells 30. Previous studies reported that the HBD gene is normally expressed in adults and encodes the delta chain; two delta chains together with two alpha chains make up HbA-2, which, together with HbF, forms part of adult hemoglobin. Mutations in the HBD gene have been associated with hemoglobin Lepore-beta-thalassemia syndrome and fetal hemoglobin quantitative trait loci 31. The HBM gene has an open reading frame (ORF) that encodes a 141-amino-acid polypeptide resembling the delta globins found in reptiles and birds. HBD-related pathways include factors involved in megakaryocyte development and thrombopoiesis and responses to elevated platelet cytoplasmic Ca2+. Alpha-synuclein, encoded by SNCA, belongs to the synuclein family, which also includes beta- and gamma-synuclein. Synucleins are abundantly expressed in the brain, and α- and β-synuclein selectively inhibit phospholipase D2. SNCA may serve to integrate presynaptic signaling and membrane trafficking. SNCA peptides have been reported to be a major component of amyloid plaques in the brains of patients with Alzheimer's disease 32. SERPINE1 encodes a member of the serine protease inhibitor (serpin) superfamily that is the major inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA). In addition, it is reported to take part in innate antiviral immunity.
Studies have shown that SERPINE1 defects cause plasminogen activator inhibitor 1 (PAI-1) deficiency, and high concentrations of the protein are linked to thrombophilia 33. Selenium-binding protein 1, encoded by SELENBP1, belongs to the selenium-binding protein family. Selenium is an essential trace element with potent anticarcinogenic properties, and selenium deficiency has been reported to probably result in certain neurological diseases. The role of selenium in the prevention of cancer and neurological diseases may be mediated by selenium-binding proteins, and decreased expression of SELENBP1 may be associated with several types of cancer 34. This protein may function in ubiquitination/deubiquitination-mediated protein degradation in a selenium-dependent manner. SELENBP1-related disorders include extraoral halitosis due to methanethiol oxidase deficiency and methionine adenosyltransferase deficiency 35. Integrin subunit alpha 2b (ITGA2B) belongs to the integrin alpha chain family of proteins. The preproprotein undergoes proteolytic processing to produce light and heavy chains that are linked by disulfide bonds to form a subunit of the alpha-IIb/beta-3 integrin cell adhesion receptor. This receptor plays a vital role in the coagulation system by mediating platelet aggregation. Platelet-type bleeding disorders such as Glanzmann thrombasthenia, in which platelets are unable to aggregate, are caused by mutations in ITGA2B 36.
Furthermore, our analysis identified several TFs (TEAD4, TGIF2LY, GATA5, GATA1, GATA2, FOS) related to PAH, suggesting that they serve an important function in PAH. Next, based on current relevant studies, we take a closer look at the association between PAH and the identified TFs.
TEAD4 (TEA domain transcription factor 4) is an important component of the Hippo signaling pathway, which controls organ growth and suppresses tumors by limiting cell proliferation and promoting apoptosis. TEAD4 regulates cell proliferation, migration and epithelial-mesenchymal transition (EMT) by mediating the gene expression of YAP1 and WWTR1/TAZ. A previous study showed that luteolin ameliorated pulmonary vascular remodeling and RV hypertrophy in a rat PAH model and inhibited smooth muscle cell proliferation and migration in a dose-dependent manner through inhibition of the HIPPO-YAP/PI3K/AKT pathway 37. TGIF2LY, which belongs to the TALE/TGIF homeobox family of transcription factors, is located in the male-specific region of chromosome Y, in a large block of sequence that is thought to result from an X-to-Y transposition. It may exert transcriptional regulation in the testis and act as a competitor or regulator of the gene TGIF2LX. To date, no study has associated TGIF2LY with PAH. The GATA family members GATA1, GATA2 and GATA5 are related by the extent of amino acid sequence identity in their zinc finger DNA-binding domains, and all of them can bind the GATA sequence. With regard to reprogramming, the three GATA family members are reported to act as inducers of pluripotency reprogramming. Specifically, as a transcriptional activator or repressor, GATA1 may act as a general switch during erythroid development. GATA1 binds to DNA sites with the consensus sequence 5′-GATA-3′ within regulatory regions of globin genes and other erythroid genes, and activates the transcription of genes involved in erythroid differentiation of K562 erythroleukemia cells, including HBB, HBG1/2, ALAS2 and HMBS. GATA2 is a transcription factor involved in stem cell maintenance and hematopoiesis and in mediating endothelin-1 gene expression in endothelial cells.
GATA5 is an indispensable TF during cardiovascular development 38 and in the transcriptional program that underlies smooth muscle cell diversity. GATA5 has been reported to bind the CEF-1 nucleoprotein binding site of the cardiac-specific slow/cardiac troponin C transcriptional enhancer 39. FOS is a nuclear phosphoprotein that forms a tight but non-covalently linked complex with the JUN/AP-1 transcription factor. In the heterodimer, the basic regions of both FOS and JUN/AP-1 appear to interact with symmetrical DNA half-sites. Upon TGF-β activation, a multimeric SMAD3/SMAD4/JUN/FOS complex forms at the AP1/SMAD binding site to mediate TGF-β signaling. FOS has been documented to play a critical role in the development and maintenance of skeletal cells 40, and it is believed to play an important role in signal transduction, cell proliferation and differentiation.
To further investigate the functions of immune cells in PAH, CIBERSORT was used for immune infiltration analysis. The results showed that infiltration of activated NK cells, monocytes, activated memory CD4 T cells and mast cells was increased, and infiltration of naive CD4 T cells was decreased, in the peripheral blood of PAH patients compared with the control group, which may be related to the development and exacerbation of PAH. Notably, monocytes made up the highest proportion of immune cells in PAH peripheral blood. Previous research showed that, in response to hypoxia, pulmonary monocytes sense hypoxia, infiltrate pulmonary arterioles and accelerate vascular remodeling, which suggests that the monocyte lineage contributes directly to the pathogenesis of PAH 41. Activated NK cells can identify and eliminate virus-infected or tumor cells and release granules containing perforin and granzymes, with the participation of FasL and TRAIL molecules 42. Ormiston et al. demonstrated that NK cells in PAH patients produced a greater amount of matrix metalloproteinase 9, leading to deleterious effects on pulmonary vascular remodeling and functional impairment 43,44. In a clinical study, mast cells were found to be abundant in PAH by quantitatively measuring the number and activity of mast cells in blood and urine among 44 PAH patients and 29 healthy individuals 45. In addition, the mast cells were identified as the connective tissue type expressing tryptase and chymase 46. These findings indicate that mast cells contribute to the vascular pathophysiology of PAH, which is consistent with the findings of our study.

Conclusion
In summary, the results of the present study suggest that the development of pulmonary hypertension is probably the result of an imbalance between pulmonary vascular remodeling and the immune microenvironment. We found that the genes SLC4A1, AHSP, ALAS2, CA1, HBD, SNCA, HBM, SELENBP1, SERPINE1 and ITGA2B are the most notable markers of PAH. In addition, TEAD4, TGIF2LY, GATA5, GATA1, GATA2 and FOS are predicted TFs regulating the DEGs and may be regarded as potential targets for the prevention of pulmonary vascular remodeling. Activated NK cells, monocytes, activated memory CD4 T cells and mast cells were detected as the most prominent immune cells infiltrating the peripheral blood of PAH patients. Because a large number of immune cells are clearly changed in PAH, it can be conjectured that the immune system plays an important part in this pathological process, although it is not clear how the immune system affects vascular remodeling.
The limitations of this study are that the small amount of data may bias the results and that we did not carry out experiments to verify the results obtained. Further experiments on immune cells may identify targets to improve immunomodulatory therapy for PAH patients. To a certain extent, the key genes and TFs can be interpreted as closely related to the occurrence of PAH and as potential therapeutic targets and markers; however, the specific mechanism needs to be explored in animal and cell experiments.

Table 3. Summary of the function of 10 key genes.
Gene | Full name | Function
SLC4A1 | Solute carrier family 4 member 1 | Part of the anion exchanger (AE) family, expressed in the erythrocyte plasma membrane, where it functions as a chloride/bicarbonate exchanger involved in carbon dioxide transport from tissues to lungs. This protein is predominantly dimeric but forms tetramers in the presence of ankyrin. Many mutations in this gene are known in man, and these mutations can lead to two types of disease: destabilization of the red cell membrane leading to hereditary spherocytosis, and defective kidney acid secretion leading to distal renal tubular acidosis
AHSP | Alpha hemoglobin stabilizing protein | Encodes a molecular chaperone which binds specifically to free alpha-globin and is involved in hemoglobin assembly. The encoded protein binds to monomeric alpha-globin until it has been transferred to beta-globin to form a heterodimer, which in turn binds to another heterodimer to form the stable tetrameric hemoglobin. Diseases associated with AHSP include Beta-Thalassemia and Thalassemia. GO annotations related to this gene include unfolded protein binding and hemoglobin binding. Acts as a chaperone to prevent the harmful aggregation of alpha-hemoglobin during normal erythroid cell development. Specifically protects free alpha-hemoglobin from precipitation
ALAS2 | 5′-Aminolevulinate synthase 2 | An erythroid-specific mitochondrial enzyme that catalyzes the first step of the heme biosynthetic pathway; defects in ALAS2 lead to X-linked pyridoxine-responsive sideroblastic anemia. Diseases associated with ALAS2 include Anemia, Sideroblastic, 1 and Protoporphyria, Erythropoietic, X-Linked. Among its related pathways are Metabolism and Metabolism of porphyrins. Gene Ontology (GO) annotations related to this gene include pyridoxal phosphate binding and glycine binding

Materials and methods
Raw data acquisition and preprocessing. In this study, three gene expression profiles, GSE22356, GSE131793 and GSE168905, were obtained from the GEO database (Home - GEO - NCBI (nih.gov)). GSE22356 was based on the GPL570 platform and included 10 normal samples and 10 PAH samples. GSE131793 was based on the GPL6244 platform and included 10 healthy samples and 10 PAH samples. GSE168905 was based on the GPL16791 platform and contained 9 control samples and 12 PAH samples. The raw data of each profile were annotated according to the respective platform file, and probes were then converted into gene symbols 47. Perl and R (version 4.0.2) were used for data preprocessing.
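The probe-to-gene-symbol conversion can be sketched in Python as below. The text does not say how several probes mapping to the same gene were combined, so averaging them is an assumption of this sketch; the probe identifiers, expression values and the `collapse_probes` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical platform annotation (probe -> gene symbol) and expression
# values (probe -> one value per sample). An empty symbol means "unannotated".
probe_to_symbol = {"p1": "ALAS2", "p2": "ALAS2", "p3": "SNCA", "p4": ""}
probe_expression = {
    "p1": [8.0, 9.0],
    "p2": [10.0, 11.0],
    "p3": [5.0, 5.5],
    "p4": [3.0, 3.1],
}


def collapse_probes(expression, annotation):
    """Map probes to gene symbols, dropping unannotated probes and
    averaging probes that share a symbol (an assumed collapsing rule)."""
    grouped = defaultdict(list)
    for probe, values in expression.items():
        symbol = annotation.get(probe, "")
        if symbol:
            grouped[symbol].append(values)
    return {
        symbol: [mean(column) for column in zip(*rows)]
        for symbol, rows in grouped.items()
    }


gene_expression = collapse_probes(probe_expression, probe_to_symbol)
print(gene_expression)
```

Other collapsing rules, such as keeping the probe with the highest mean expression, are also common; the choice can change which genes pass the fold-change threshold.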

Identification of DEGs.
To identify the significant DEGs between the PAH group and the healthy group, the Limma (linear models for microarray data) package was applied to the three datasets with |log2 fold change (FC)| ≥ 1 and adjusted p value < 0.05 as the thresholds 21. We obtained the DEGs of the three datasets separately. Additionally, volcano plots and heatmaps were generated to evaluate the selected DEGs.
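The adjusted p value used as a threshold can be illustrated with the Benjamini-Hochberg procedure. The text does not name the adjustment method; Benjamini-Hochberg is assumed here because it is the default adjustment in limma's `topTable`. The function below is a stand-alone sketch with hypothetical input values, not the limma implementation.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for four genes.
adjusted_example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(p, 4) for p in adjusted_example])
```

A gene would then count as a DEG only if its adjusted p value is below 0.05 and its |log2 FC| meets the fold-change threshold.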
Functional and pathway enrichment analysis. To assess the biological functions of the DEGs, we combined the DEGs for gene ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis using the online tool DAVID (https://david.ncifcrf.gov/tools.jsp), which classifies [...]

Protein-protein interaction (PPI) network and hub gene identification. The PPI network among all DEGs was established with the online tool STRING (https://cn.string-db.org/), and Cytoscape (version 3.7.2) was then used to adjust and visualize the PPI network 49. Subsequently, we used the cytoHubba plug-in of Cytoscape to determine the top 10 hub genes according to the MCC algorithm. All key genes' scores were over 15 degrees.
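The maximal clique centrality (MCC) score can be sketched as follows, using the definition commonly given for cytoHubba: the MCC of a node is the sum of (|C| − 1)! over all maximal cliques C that contain it. This is an independent illustration rather than the cytoHubba code; the toy graph and function names are hypothetical, and isolated nodes are not treated specially.

```python
from math import factorial


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion.
    `adj` maps each node to the set of its neighbours."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def mcc_scores(adj):
    """MCC(v) = sum of (|C| - 1)! over maximal cliques C containing v."""
    scores = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        for v in clique:
            scores[v] += factorial(len(clique) - 1)
    return scores


# Hypothetical toy network: a triangle A-B-C with a pendant node D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
scores = mcc_scores(adj)
top = sorted(scores, key=scores.get, reverse=True)[:1]
print(scores, top)
```

Because of the factorial term, nodes that sit in large, densely connected cliques score far higher than nodes with the same degree in sparse neighbourhoods, which is why hub genes selected by MCC tend to come from one tight module.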

Construction of TF-DEG regulation network. To search for potential interactions between DEGs and transcription factors (TFs), another Cytoscape plugin, iRegulon, was used to identify the TFs targeting the DEGs. The enriched motifs in iRegulon were ranked according to their direct targets by means of position weight matrices 50. The TF-DEG pairs were obtained from databases such as TRANSFAC and TRED. The TF-DEG regulatory network was then visualized with Cytoscape.
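The NES > 4 cutoff reported for the predicted TFs can be illustrated with a z-score style normalization, in which each motif's enrichment value (for example, an area under the recovery curve, AUC) is compared with the mean and standard deviation over all motifs. This is a simplified sketch under that assumption, not the iRegulon implementation; the AUC values, motif names and the use of the population standard deviation are hypothetical choices.

```python
from statistics import mean, pstdev


def normalized_enrichment_scores(auc_by_motif):
    """Z-score each motif's AUC against the mean and standard deviation
    of the AUC values of all motifs."""
    values = list(auc_by_motif.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("all AUC values are identical; NES is undefined")
    return {motif: (auc - mu) / sigma for motif, auc in auc_by_motif.items()}


# Hypothetical AUC values: 50 background motifs plus one strongly enriched motif.
auc_by_motif = {f"background_motif_{i}": 0.02 + 0.0002 * (i % 5) for i in range(50)}
auc_by_motif["hypothetical_enriched_motif"] = 0.08

nes = normalized_enrichment_scores(auc_by_motif)
selected = sorted(motif for motif, score in nes.items() if score > 4)
print(selected)
```

Only motifs whose score stands several standard deviations above the background pass the cutoff, and the TFs associated with those motifs are the ones reported as candidate regulators.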
Immune cell infiltration analysis. Numerous studies have shown that PAH is related to the immune microenvironment, so we performed this analysis. The corrected gene expression matrix was uploaded to the CIBERSORT website (CIBERSORT (stanford.edu)), an online tool that uses a deconvolution algorithm to calculate the percentages of 22 kinds of infiltrating immune cells, with P < 0.05 as the standard 51. The results of the inferred immune cell infiltration analysis were then visualized in R in different ways (Supplementary Table 1).

Gene | Full name | Function
[...] | [...] | A member of the transcriptional enhancer factor (TEF) family of transcription factors, which contain the TEA/ATTS DNA-binding domain. It is preferentially expressed in the skeletal muscle, and binds to the M-CAT regulatory element found in promoters of muscle-specific genes to direct their gene expression. Alternatively spliced transcripts encoding distinct isoforms, some of which are translated through the use of a non-AUG (UUG) initiation codon
TGIF2LY | TGFB induced factor homeobox 2 like Y-linked | A member of the TALE/TGIF homeobox family of transcription factors. This gene lies within the male specific region of chromosome Y, in a block of sequence that is thought to be the result of a large X-to-Y transposition. The C-terminus of this protein is divergent from that of its chromosome X homolog (TGIF2LX), suggesting that this protein may act as a regulator of TGIF2LX
GATA5 | GATA binding protein 5 | A transcription factor that contains two GATA-type zinc fingers. The encoded protein is known to bind to hepatocyte nuclear factor-1alpha (HNF-1alpha), and this interaction is essential for cooperative activation of the intestinal lactase-phlorizin hydrolase promoter. In other organisms, similar proteins may be involved in the establishment of cardiac smooth muscle cell diversity